Digitization Errors In Hungarian Documents

Authors

  • Máté Pataki
  • Tamás Füzessy

Abstract

Our task was to analyze a particular digitization system, determine what types of errors emerge during the process, and assess how these errors affect the searchability of the digitized documents. We have set up a testbed suitable for the automatic, large-scale processing of digitized texts. In this paper we briefly introduce the methodology of document digitization, emphasizing the sources of error in the process, and sketch the results obtained from our test system, especially the Hungarian-language-dependent characteristics of the emerging errors.

Similar resources

Towards the Creation of a Robust Search Index for Digitalized Documents

The simultaneous support of electronic and paper-based document handling is a natural demand of current filing and document management systems. To support the better management of search and retrieval functions and to reduce the high costs of digitizing, the Department of Distributed Systems of SZTAKI analysed the different kinds of error that emerged during the digitization process of Hungaria...

High-resolution video mosaicing for documents and photos by estimating camera motion

Recently, document and photograph digitization from a paper is very important for digital archiving and personal data transmission through the internet. To realize easy and high quality digitization of documents and photographs, we propose a novel digitization method that uses a movie captured by a hand-held camera. In our method, first, 6-DOF(Degree Of Freedom) position and posture parameters ...

Book-Adaptive and Book-Dependent Models to Accelerate Digitization of Early Music

Optical music recognition (OMR) enables early music collections to be digitized on a large scale. The workflow for such digitisation projects also includes scanning and preprocessing, but the cost of expert human labour to correct automatic recognition errors dominates the cost of these other two steps. To reduce the number of recognition errors in the OMR process, we present an innovative appl...

‘As We May Digitize’ — Institutions and Documents Reconfigured

This article frames digitization as a knowledge organization practice in libraries and museums. The primarily discriminatory practices of museums are compared with the non-discriminatory practices of libraries when managing their respective cultural heritage collections. Digitization of cultural heritage brings new practices, tools and arenas that reconfigure and reinterpret not only the collec...

Publication date: 2007